An  RRT-Based  Algorithm  for  Testing  and 
Validating  Multi-Robot  Controllers 


Jongwoo  Kim 
GRASP  Laboratory 
University  of  Pennsylvania 
Philadelphia,  PA  19104 
Email:  jwk@grasp.cis.upenn.edu 


Joel  M.  Esposito 
Weapons  and  Systems  Engineering 
US  Naval  Academy,  MD  21402 
Email:  esposito@usna.edu 


Vijay  Kumar 
GRASP  Laboratory 
University  of  Pennsylvania 
Philadelphia,  PA  19104 
Email:  kumar@grasp.cis.upenn.edu 


Abstract —  We  address  the  problem  of  testing  complex  reactive 
control  systems  and  validating  the  effectiveness  of  multi-agent 
controllers.  Testing  and  validation  involve  searching  for  condi¬ 
tions  that  lead  to  system  failure  by  exploring  all  adversarial 
inputs  and  disturbances  for  errant  trajectories.  This  problem  of 
testing  is  related  to  motion  planning,  with  one  main  difference. 
Unlike  motion  planning  problems,  systems  are  typically  not 
controllable  with  respect  to  disturbances  or  adversarial  inputs 
and  therefore,  the  reachable  set  of  states  is  a  small  subset  of 
the  entire  state  space.  In  both  cases  however,  there  is  a  goal  or 
specification  set  consisting  of  a  set  of  points  in  state  space  that  is 
of  interest,  either  for  demonstrating  failure  or  for  validation. 

In  this  paper  we  consider  the  application  of  the  Rapidly- 
exploring  Random  Tree  algorithm  to  the  testing  and  validation 
problem.  Because  of  the  differences  between  testing  and  motion 
planning,  we  propose  three  modifications  to  the  original  RRT 
algorithm.  First,  we  Introduce  a  new  distance  function  which 
incorporates  information  about  the  system’s  dynamics  to  select 
nodes  for  extension.  Second,  we  Introduce  a  weighting  to  penalize 
nodes  which  are  repeatedly  selected  but  fail  to  extend.  Third, 
we  propose  a  scheme  for  adaptively  modifying  the  sampling 
probability  distribution  based  on  tree  growth.  We  demonstrate 
the  application  of  the  algorithm  via  three  simple  and  one 
large  scale  example  and  provide  computational  statistics.  Our 
algorithms  are  applicable  beyond  the  testing  problem  to  motion 
planning  for  systems  that  are  not  small  time  locally  controllable. 

1.  Introduction 

As  the  use  of  logic-based  or  reactive  control  laws  grows  in 
both  robotics  and  other  fields,  so  does  the  need  for  automated 
design  and  analysis  tools.  The  focus  to  date  in  the  auto¬ 
mated  safety  verification  literature  has  been  on  the  solution 
of  the  reachability  problem,  initially  through  symbolic  meth¬ 
ods  (e.g.,  [11],  [18])  and  later  through  numerical  techniques 
(e.g.,  [6],  [17]).  However,  the  class  of  systems  for  which 
the  reachability  problem  is  tractable  is  quite  limited  in  both 
expressiveness  and  dimensionality.  An  alternative  approach 
to  exhaustively  proving  safety  is  to  simply  search  for  a 
single  counter  example  -  a  series  of  inputs,  disturbances  or 
parameters  that  causes  a  system  to  fail.  We  term  this  semi¬ 
decision  approach  the  Testing  Problem. 

Inspired  by  the  connections  between  the  Testing  Problem  for 
complex  control  systems  and  the  Motion  Planning  problem, 
we  have  recently  applied  the  Rapidly-exploring  Random  Tree 
(RRT)  algorithm  [13]  to  the  testing  problem  [1],  [7]  with 


considerable  success.  RRT  algorithm  is  an  incremental  search¬ 
ing  algorithm  which  explores  state  space  fast  and  uniformly. 
However,  the  two  problems  are  different.  Perhaps  the  most 
significant  difference  between  the  two  problems  lies  in  the 
nature  of  the  system  dynamics  in  each  case.  Robotic  systems 
are  almost  always  controllable  (by  design),  so  the  reachable 
space  is  often  the  entire  free  space.  With  the  exception  of 
any  workspace  obstacles,  whose  configurations  are  known 
in  advance,  the  tree  can  be  expected  to  extend  to  fill  the 
entire  state  space.  On  the  other  hand,  when  we  test  complex 
control  systems,  it  is  frequently  with  respect  to  disturbances 
or  adversarial  inputs.  These  systems  are  frequently  not  con¬ 
trollable  with  respect  to  disturbances  or  adversarial  inputs  — 
in  fact,  the  reachable  set  is  usually  a  tiny  fraction  of  the 
entire  state  space.  In  such  systems,  the  traditional  uniform 
sampling  distribution,  combined  with  the  inherent  Voronoi  bias 
of  the  RRT  algorithm,  leads  to  a  slow  reduction  (improvement) 
in  dispersion  (coverage).  The  issue  is  not  easily  remedied 
because,  unlike  C-space  obstacles,  the  reachable  set  is  not  a 
priori  known. 

Accordingly,  we  propose  three  modifications  to  the  original 
RRT  algorithm.  Eirst,  we  develop  a  new  distance  function 
which  encodes  local  information  about  the  system’s  dynamic 
constraints  with  a  first  order  approximation.  Second,  because 
the  reachable  state  space  is  generally  a  small  fraction  of 
the  total  state  space,  we  introduce  a  weighting  factor  which 
penalizes  the  repeated  extension  of  boundary  nodes.  Einally, 
we  propose  a  scheme  for  adaptively  modifying  the  sampling 
probability  distribution  between  the  traditional  uniform  distri¬ 
bution  and  heavily  biased  toward  the  specification  set  based 
on  tree  growth. 

The  paper  is  organized  as  follows.  In  Section  II-A  we 
formally  define  the  testing  problem.  Section  II-B  reviews  the 
original  RRT  algorithm  and  reviews  the  most  relevant  litera¬ 
ture.  Section  III  examines  three  key  features  of  the  traditional 
RRT  algorithm  which  are  troublesome  for  testing  problems; 
proposes  methods  to  remedy  them  and  presents  simple  illus¬ 
trative  examples,  complete  with  comparative  computational 
statistics.  A  new  algorithm  unifying  the  enhancements  is 
presented  in  Section  IV.  The  algorithm  is  used  to  solve  a  multi¬ 
agent  pursuit-evasion  problem  and  performance  statistics  are 
discussed.  Concluding  remarks  follow  in  Section  V. 
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B.  Related  Work 


A.  Problem  Statement 

Definition  2.1:  We  define  a  Finite  Time  Control  System 
as  a  tuple  C  =  {X,  U,  T,  Init,  /)  where 

•  X  C  K."  is  a  set  of  free  state  variables; 

•  U  C  M™  is  a  compact  set  of  input  values; 

•  T  =  [fg,  C  R  is  a  compact  time  interval  the  system 
evolves  over; 

•  Init  C  X  is  a  compact  set  of  possible  initial  conditions; 

•  f  :  X  X  U  ^  R”  is  a  vector  field  which  prescribes  the 
time  derivative  of  the  state  variables. 

We  are  generally  interested  in  systems  with  collections 
of  rigid  bodies  with  very  complicated  dynamics,  espe¬ 
cially  high-dimensional  continuous  systems  or  hybrid  (dis¬ 
crete/continuous)  and  switched  systems  where  /  may  be  a 
non-smooth  function  of  x.  We  do  not  impose  any  structure 
on  the  nature  of  the  dynamics  (except  assuming  that  solutions 
exist  in  the  sense  of  Fillipov).  Note  that  in  the  case  of  rigid 
body  systems  X  is  essentially  C/^ee  n  TCfree-  We  use  the 
term  “input”  in  the  most  general  sense  in  that  it  can  include 
yet  unspecified  feedback  control  inputs,  human  in  the  loop 
inputs,  and  disturbances. 

Problem  2.2:  Testing  Problem;  Given  a  tuple 
where 

•  C  =  {X,  U,  T,  Init,  /)  is  a  finite  time  control  system, 

•  S  Init,  and 

•  S'  is  a  specification  set, 

the  goal  is  to  determine  an  open  loop  control  law  U  :  T  ^  U 
such  that  3f  G  T  for  which  x{t)  G  S. 

In  other  words,  the  goal  is  to  determine  a  counter-example 
-  an  input  sequence  which  will  cause  the  system  to  fail  by 
entering  S  -  if  one  exists.  However,  in  order  to  make  the 
problem  algorithmically  tractable,  instead  of  searching  the  set 
of  all  possible  functions  U  :  T  ^  U,  the  search  must  be 
restricted  to  some  subset  of  functions  with  finite  dimensional 
parametrization. 

For  the  sake  of  convenience  we  make  three  additional 
assumptions.  First,  assume  X  C  R"  is  defined  in  such  a  way 
that  a  point  in  R"  can  be  easily  tested  for  membership  in  X. 
Second,  assume  the  specification  set  S  can  be  defined  as  the 
sub-level  set  of  some  function  S  =  {x\x  G  X,s{x)  <  0}. 
Finally,  we  restrict  our  search  over  U  to  piecewise  constant 
functions  of  time  with  k  segments,  each  of  time  duration  6t. 
Thus,  instead  of  the  continuous  map  U,  we  consider  the  search 
over  U  :  T  ^  U,  as  the  search  for  a  k-vector  of  parameters. 
With  M*  G  [/ 


We  base  our  approach  on  the  Rapidly-exploring  Random 
Tree  (RRT)  algorithm  [13].  A  very  basic  algorithm  is  given  in 
Algorithms  1  and  2,  where  p  is  some  suitable  metric  and  pdf 
is  a  probability  distribution.  RRTs  are  attractive  because  they 
work  directly  in  the  space  of  admissible  inputs  making  them 
suitable  for  systems  with  dynamic  constraints  and  because 
they  are  probabilistically  complete  [13].  While  much  work 
on  safety  verification  exists,  the  approach  of  using  RRTs  to 
analyze  hybrid  systems  is  recent.  In  [8]  RRTs  were  used 
to  design  trajectories  of  hybrid  systems.  The  first  published 
work  using  RRTs  for  analyzing  hybrid  systems  is  [3],  [15]. 
In  a  similar  vein,  a  blimp  system  control  law  was  validated 
under  unpredictable  but  bounded  disturbances  [10].  In  [2], 
the  reachable  set  for  aircraft  collision  avoidance  problem  was 
obtained  and  several  extensions  of  the  RRT  approach  were 
mentioned.  We  have  applied  a  variant  of  this  method  [1]  to 
testing  hypotheses  and  establishing  properties  of  biological 
networks. 

We  review  two  developments  from  [7],  used  later  in  this 
paper:  coverage  estimation  and  the  RRFT  algorithm.  First,  the 
coverage  of  the  state  space  with  tree  nodes  is  important  both 
because  it  can  be  used  as  a  termination  criteria  in  the  event  a 
solution  is  not  found  and  because  it  provides  a  methodology 
for  comparing  the  effectiveness  of  two  algorithms  that  is 
not  dependent  on  the  goal  position.  Typically  Dispersion  is 
used  [12]  which  is  loosely  defined  as  the  radius  of  the  largest 
ball  in  X  which  does  not  contain  a  tree  node.  We  reject  it 
on  the  grounds  that  it  is  difficult  to  compute  and  because, 
by  only  focusing  on  the  largest  such  ball,  it  yields  an  overly 
conservative  estimate  of  coverage.  We  introduce  a  coverage 
measure  which  can  be  thought  of  as  a  discretized  average 
dispersion.  Given  an  RRT  (T)  and  a  set  of  grid  points  G  C  X 
with  spacing  6x 


:=(T,G)  =  1- 


\G\-5x 


^  wl\tl{p{x^ ,T),5x). 


Second,  the  Rapidly-exploring  Random  Forest  of  Trees 
(RRFT)  algorithm  searches  over  time  invariant  parameters  and 
initial  conditions  by  planting  many  RRTs  at  a  sampling  of 
parameter  values.  Individual  trees  are  grown  and  terminated 
by  monitoring  c.  Both  c  and  the  RRFT  method  are  used  in 
Sect.  IV. 


Algorithm  1  Generate  RRT;  T 
Initialize  RRT;  T.addVertex(a;°) 
while  ^x  G  T  such  that  s{x)  <  0  do 
Extend(T) 

end  while 


so  the  input  u(t)  is  given  by 

u{t)  =  €U  if  to  +  {i  —  l)(5f  <t<to  +  {i)6t 

for  f  =  1, . . . , fc. 


There  have  been  several  enhancements  to  the  basic  RRT 
algorithm.  In  [5]  a  method  for  penalizing  the  repeated  selection 
of  collision  prone  nodes  for  extension  is  introduced.  In  [16]  a 
node  selection  strategy  is  described  which  increases  the  natural 
Voronoi  bias  of  the  method  for  the  purposes  of  dispersion 


Fig.  1.  The  reachable  space  is  shown  as  the  shaded  gray  region.  The 
arrows  indicate  the  possible  velocity  vectors  at  each  node.  Nodes  selected 
for  extension  on  the  basis  of  their  distance  from  may  be  difficult 

to  connect  from  when  the  system  is  not  small  time  locally  controllable, 
is  a  better  candidate  for  extension. 


Algorithm  2  Extend(T) 

^rand  g  ^  ^  pdf() 

^near  ^  arg  miiiajj  gr 

^new  ^  ^near  J"5t 

T.addVertex(a:’"®“') 
T.addEdge(M’"™,  ^ 


reduction.  However,  neither  approach  is  able  to  reduce  the 
dispersion  (which  must  be  measured  within  the  reachable 
set)  for  uncontrollable  systems.  Biasing  the  sampling  toward 
regions  close  to  the  goal  state  has  been  tried  in  [14],  [15] 
and  [3]  with  some  success.  However  the  sample  bias  factor 
is  fixed  a  priori  and  it  can  lead  to  difficulties  in  non-convex 
systems  because  of  the  presence  of  local  minima.  In  [10],  a 
metric  accounting  for  under-actuated  dynamics  is  suggested 
but  is  specific  to  the  aerial  robots  example  considered  there. 

III.  Enhancements  to  the  RRT  Algorithm 
In  this  section,  we  propose  three  modihcations  to  the 
original  RRT  algorithm,  all  designed  to  deal  with  systems  that 
may  only  traverse  a  small  fraction  of  the  entire  state  space  and 
in  which  there  are  no  obvious  metrics  to  establish  proximity 
relationships.  Recall  that  the  Voronoi  bias  coupled  with  the 
use  of  a  uniform  distribution  decreases  the  dispersion  of  the 
tree  nodes  in  X.  However,  for  uncontrollable  systems  it  may 
be  impossible  to  reduce  the  dispersion  of  the  tree  nodes  in  X 
below  a  critical  value,  which  is  an  unknown  constant.  Instead 
the  goal  is  to  simply  find  a  solution  quickly  while  reducing  the 
dispersion  of  the  tree  nodes  within  the  reachable  space,  TZ,  by 
using  heuristics  to  account  for  the  system’s  motion  constraints. 

A.  Dynamics-based  selection  of  proximal  node 
Example  3.1:  Consider  the  trivial  example 

Xl  =2,  X2  =  u,  (III-l) 

where  u  €  U  =  [1,2],  The  reachable  space,  which  is  normally 
unknown  can  easily  be  computed  by  hand  in  this  case,  and 
is  shown  as  the  shaded  region  in  Fig.  1.  A  state  2;’’“”'^  is 
generated  and  the  planner  must  select  the  “closest”  tree  node, 
^near  jq  attempt  to  connect  from.  Line  2  of  Algorithm  2 
(traditional  RRT)  selects  <—  x^  for  extension  based  on 


proximity  to  g  X,  as  determined  by  a  distance  metric 

p  that  is  implicitly  assumed  to  be  a  Euclidean  metric. 

However,  none  of  the  possible  velocity  vectors  at  that 
state  (indicated  as  region  between  the  thick  arrows)  are  able 
to  proceed  in  the  required  direction.  Despite  the  fact  that 
>  p{x^ ,  x'^°'^'^),  x^  is  actually  more  suited 
to  extension  because  the  possible  velocity  vectors  include  a 
direction  that  moves  toward  In  addition  to  testing  prob¬ 

lems,  this  situation  arises  in  a  variety  of  robotic  applications 
where  the  system  is  nonholonomic  {e.g.,  wheeled  carts),  and 
particularly  in  systems  with  constraints  on  forward  velocities 
(e.g.,  unmanned  aerial  vehicles).  Ideally  both  distance  and 
velocity  constraints  should  be  used  to  estimate  a  “time  to 
connection”. 

To  remedy  the  situation  in  Fig.  1  we  propose  replacing 
p{xf  x‘^°'‘^‘^)  in  Line  2  of  Algorithm  2  with  a  local  first  order 
approximation  of  the  time-to-go. 


l'2go(t^ 


„rand 


)  = 


p{x^ 


X 

(X) 


rand 


)/g  if  g>0 
if  9  <0 


(111.2) 


where  g  represents  the  instantaneous  speed  with  which  2;’’“”'^ 
can  be  approached 


q  =  max 
ueu 


dx 


f  {x,  u)  la;  — 


Intuitively  t2go  computes  the  distance  from  x^  to  2:’’“”^^  and 
divides  by  a  first  order  approximation  of  the  speed  with  which 
the  distance  can  be  decreased,  giving  t2go  units  of  time.  Note 
that  a  negative  value  of  g  implies  that  the  distance  is  actually 
increasing,  which  can  be  interpreted  as  inhnite  “time-to-go” 
(to  hrst  order).  In  a  given  iteration  if  none  of  the  existing 
nodes  have  a  finite  value  for  t2go,  one  can  be  chosen  at 
random  or  based  on  some  secondary  criteria  (such  as  distance 
as  determined  by  p). 

From  a  computational  point  of  view  the  maximization  may 
be  done  by  exhaustive  search  or  by  exploiting  some  problem 
dependent  feature.  For  example  if  f{x,  u)  is  an  affine  function 
of  u  and  the  set  U  is  the  Cartesian  product  of  rectangles,  the 
maximization  is  a  linear  program  in  n  dimensions  which  can 
be  solved  efficiently.  If  no  efficient  methods  exist  to  compute 
this  quantity,  evaluating  every  node  via  this  method  can  be 
intensive.  In  such  a  case,  f2go  can  be  used  as  a  secondary 
criteria  to  select  2;"®“’'  among  the,  for  example,  10  closest 
nodes  according  to  the  Euclidean  metric. 

We  next  consider  an  example  that  is  from  the  verihcation 
community.  Although  it  is  not  central  to  robotics,  it  has 
many  of  the  properties  that  are  central  to  multi-agent  robotic 
systems. 

Example  3.2:  The  hybrid  automata  model  of  a  thermostat 
has  been  a  popular  example  in  the  verification  literature  [9]. 
Fig.  2  shows  the  system  model,  x  =  (xi,X2,X3)  £  AT  C 
where  xi  is  the  temperature  in  the  room,  X2  is  the  elapsed 
time,  and  2:3  is  a  timer  that  measures  the  cumulative  amount 
of  time  the  heater  has  been  on  for.  The  dynamics  have  two 
modes  which  denote  the  heater  being  “on”  or  “off’.  U  consists 
of  Uon  =  [2,4];  and  Mq//  =  [~3j“l]-  The  values  Uon 


jr,<l 


on:  q  =  1 

q  =  2 

/  -^1  =  =  [2.4]  \ 

1  -V.  =  1  J 

1  1  i,  =  1  ) 

y 

=0  y 

x,>3 

Fig.  2.  The  system  dynamics  of  the  thermostat. 


Fig.  3.  The  solution  of  the  thermostat  counter  example  via  the  RRT  using 
the  dynamics-based  selection  of  proximal  nodes  (Temperature  vs.  time). 

Uoff  represent  the  possible  heating  and  cooling  rates  in  the 
two  modes.  The  conditions  Xx  <  1  and  Xi  >  Z  enable  the 
mode  switches  off  on  and  on  off  respectively.  In  [9] 
a  symbolic  verification  tool  is  used  to  answer  the  question: 
“After  an  initialization  period  of  two  minutes,  is  it  possible 
for  the  heater  to  be  on  for  more  than  two  thirds  of  the  total 
time  at  any  point  during  the  first  hour  of  operation?”  Such  a 
question  may  be  important  from  an  energy  consumption  point 
of  view.  Therefore  the  specification  set  is 

S'  =  {x  G  X\2/Zx2  —  X3  <  0  a  —X2  +  2  <  0}. 

The  initial  conditions  were  mode  =  “on”,  and  x°  =  [2  0  0]^. 
Aside  from  being  a  classical  verification  example,  the  senario 
is  interesting  in  its  own  right.  First,  the  system  has  quite 
nontrivial  dynamics,  since  the  control  inputs  do  not  appear 
in  the  right  hand  side  of  two  of  the  state  equations,  or  the 
specification  equations.  This,  together  with  the  narrow  range 
of  U,  makes  the  reachable  set,  TZ  a  small  subset  of  X.  The 
set  of  possible  velocity  vectors  at  every  point  is  very  limited 
making  this  an  ideal  example  to  demonstrate  the  Dynamics- 
based  selection  of  proximal  node. 

First  the  problem  was  solved  10  times  selecting  proximal 
nodes  based  on  the  Euclidean  metric  p;  then  10  times  with 
the  Dynamics-based  selection  function  t2go-  In  all  cases,  the 
algorithm  successfully  computed  a  counter  example  as  seen 
in  Fig.  3.  Table  I  shows  the  computational  statistics  for  two 
algorithms. 

B.  History-based  selection  of  proximal  node 

A  second  situation  is  shown  in  Fig.  4  where  the  traditional 
RRT  is  applied  to  the  system  and,  after  8  iterations,  the 
resulting  tree  is  shown  using  dark  circles  and  line  segments. 
Because  the  reachable  set  is  so  small,  nodes  on  the  boundary 
will  tend  to  have  disproportionately  large  Voronoi  regions, 
such  as  x"^  in  Fig.  4.  When  a  uniform  distribution  is  used  to 
generate  most  samples  will  fall  outside  the  reachable 

set  and  these  boundary  nodes  will  be  selected  for  extension 


Metric 

No.  of 
Nodes 

Computation 
Time  (sec) 

Euclidean 

t2go 

2284 

1627 

376.4 

231 

lABLEl 

Thermostat  Example:  A  comparison  of  the  use  of  the  Euclidean 

METRIC  and  t2go  INTRODUCED  IN  SECT.  Ill- A,  AVERAGED  OVER  10 
TRIALS  ON  A  IGHZ  PC. 


Fig.  4.  The  reachable  space  is  shown  as  the  shaded  gray  region,  bold  circles 
and  lines  are  the  RRT,  and  dotted  lines  are  the  Voronoi  cells.  Nodes  on 
the  boundary  of  the  reachable  space  have  disproportionately  large  Voronoi 
regions,  causing  them  to  repeatedly  be  selected  as 

repeatedly.  Each  time,  the  same  extremal  inputs  will  be  used  to 
connect  x"^  to  in  vain,  instead  resulting  in  x^ .  Boundary 
nodes  which  are  repeatedly  selected  but  fail  to  extend  should 
be  penalized  to  counter  balance  this  Voronoi  bias  so  that  they 
are  less  likely  to  be  selected  in  the  future. 

If  a  node  is  selected  for  extention  as  in  Line  2  of 

Algorithm  2  and  the  minimization  in  Line  3  produces  an  input 
^new  yvJjjqJj  been  applied  previously,  the  resulting 
is  already  an  element  of  T.  When  this  happens  we  say  the 
node  has  “failed  to  extend”;  and  determine  the  next  best  u"®’" 
which  extends  the  tree  (suggested  in  [5]). 

Eor  each  x^  G  T  we  propose  storing  the  number  of  times 
the  node  has  failed  to  extend  nf  This  value  can  be  used  to 
compute  a  penalty  weight  to  discourage  the  repeated  selection 
of  boundary  nodes  which  fail  to  extend.  Let  n„iin  and  n„iax 
be  the  least  and  greatest  values  of  in  the  tree  at  a  given 
iteration.  The  History-based  weighting  is  defined  as 

H{x^  ^rand.^)  _  ^  j.rand'^  _  ^  ji3  _  rimin 

Pmax  Prciin  ^max  ^min 

(III.3) 

where  pmin  =  min^-igT- and  pmax  is  defined  in 
a  similar  manner.  These  bounds  are  used  to  normalize  the 
distances  so  that  the  impact  of  the  second  term  is  not  problem 
dependent.  Note  that  any  distance  function,  including  t2go  can 
be  substituted  for  p. 

Example  3.3:  The  RRT  algorithm  is  used  to  find  trajectories 
of  the  linear  dynamic  system  with  bounded  control  inputs  in 
the  form  of 

X  =  Ax  Bu  b  (III-4) 

where  x  G  X  =  [—200  200]  x  [—200  200]  and  u  G  U  = 
[-10  10]  X  [-10  10]. 

Lig.  5  shows  trajectories  generated  by  the  RRT  algorithm 
using  the  Euclidean  metric  (left)  and  using  the  History-based 
weighting  described  above  (right).  Note  that  reachable  set 


Fig.  5.  RRT  for  a  linear  system  using  the  Euclidean  metric  {left)  vs.  a  History- 
based  selection  of  proximal  node  {right).  After  5000  nodes  the  coverage  of 
the  reachable  space  is  much  more  dense  when  using  the  weighting. 


Fig.  6.  Value  of  for  each  node  (sorted  in  descending  order)  using  the 
unweighted  Euclidean  metric  {left)  and  History-based  weighting  {right). 

is  small  fraction  of  the  environment.  The  interior  of  the 
reachable  region  with  the  History-based  selection  of  proximal 
node  method  is  much  more  densely  covered  than  Euclidean 
metric.  Fig.  6  shows  for  each  node  in  T.  Nodes  are 
sorted  in  descending  order  to  facilitate  the  visualization.  In 
the  conventional  RRT  algorithm,  a  smaller  portion  of  nodes 
(on  the  boundary  of  the  reachable  set)  have  disproportionately 
high  values  of  nK 

C.  Adaptively  biased  sample  generation 

Intuitively,  biasing  the  sampling  distribution  for  ^rand  jq 
generate  a  disproportionate  number  of  samples  inside  the  set  S 
is  effective  when  the  system  is  easily  steered  toward  S  (i.e.  the 
system  is  output  controllable  with  respect  to  s{x)).  In  general, 
biasing  the  sample  distribution  toward  the  goal  can  make  sense 
but  it  is  difficult  to  decide  a  priori  which  problems  will  beneht. 

We  update  the  amount  of  biasing  for  every  iterations  of 
the  RRT  algorithm,  where  Ng  is  user  dehned  number.  If  in  a 
given  iteration  where  p  is 

a  metric  function,  we  call  such  an  iteration  successful  because 
the  tree  has  grown  toward  We  count  the  number  of 

successful  iterations  rig,  out  of  the  np  iterations  where  random 
states  are  generated  inside  the  set  dehned  by  s{x)  <  0  and 
compute 

/3  =  —  (III.5) 

np 

If  is  not  successful  in  growing  toward  inside  the 

specihcation  set  or  the  best  candidate  for  x"®*"  from  x"®“’' 
is  already  in  a  tree  in  the  above  test,  we  eliminate  the  x"®“’' 
from  consideration  as  x"®“’’  for  the  testing  in  future  iterations 
to  prevent  it  from  being  chosen  repeatedly.  Values  of  f3  close 
to  unity  indicate  biasing  sample  generation  inside  S  has  been 
benehcial. 


-10  -5  0  5  10 


Fig.  7.  The  distribution  B{x;  fi,  0)  with  fi  =  0  and  various  values  of  0. 

Our  proposed  probability  density  function  B{x;  p,  (3),  to  be 
used  in  Line  1  of  Algorithm  2,  resembles  a  Gaussian  over 
some  compact  set,  a  <  x  <  b 

(  N{x]p,a{P))  + 

B{x-,p,P)  =  l  Ct/{b  —  a),  a<x<b  (III.6) 
[  0  else 

where  N{x;  p,a{f3))  is  the  Gaussian  distribution  with  mean 
p  and  standard  deviation  a{/3).  The  last  term,  Ct/{b  —  a),  is 
added  to  ensure  that  the  area  under  the  curve  is  equal  to  one. 
Ct  represents  the  area  of  the  truncated  portions  above  x  =  b 
and  below  x  =  a 

/a  poo 

N{x;p,a)dx+  /  N{x]  p,a)dx. 

-oo  J  h 

Obviously  p  should  be  selected  so  that  s{p)  <  0.  The 
standard  deviation  of  iV(x;  p,  <j{P))  effectively  determines  the 
bias  and  should  be  computed  using  [3  £  [0, 1] 

Cr(/3)  =  (1  -  /3)(crniax  “  CTmin)  +  (HI.?) 

where  CTmax  and  (Tmin  are  user-dehned  values  of  the  maximum 
and  minimum  standard  deviations. 

Fig.  7  illustrates  the  shape  of  B{x;p,P)  with  different  val¬ 
ues  of  [3.  Distribution  (III. 6)  can  be  easily  implemented  using 
any  random  normal  generator  and  rejecting  points  generated 
outside  the  compact  domain. 

Example  3.4:  We  consider  a  hovercraft  in  constant  altitude 
flight  with  6  states,  x  =  (xi,X2,9,vi,V2,i-o).  The  dynamic 
equations  are 

mVi  =  {fl+ f2)cos{0)  +  fxiair{x,Vair{x)) 
mV2  =  ifl  +  /2)sin(0)  -f  fx2airix,Vairix)) 

Jdj  —  (y*2  (^5  Xair  (^)  ) 

The  control  inputs  are  u  =  [fi  /2]^  (forward  actuating 
forces)  and  U  =  [—10,10]  x  [—10,10].  Forces  due  to  wind 
disturbances  in  the  Xi,  X2  and  0  directions  are  fx^air.  fx2air, 
and  Tair  whose  exact  expressions  are  omitted  for  brevity  but 
are  quadratic  in  the  difference  between  the  craft’s  velocity 
and  the  wind  velocity  Vair  and  vary  with  the  orientation  of  the 
craft.  Note  that  the  state  is  partitioned  into  two  regions  (indoor 
and  outdoor)  which  dehne  the  wind  velocity  differently; 

_  /  [-ayX2  PvXi]'^,  s/ixi^  +  {X2y  <  100 

-  I  jQ  +  (x2)2  >  100  ' 


Fig.  8.  RRTs  of  the  hovercraft  problem  with  uniform  sampling  {left)  and 
with  adaptive  bias  (right). 


Fig.  9.  The  evolution  of  the  biasing  factor  (3  for  the  hovercraft  problem. 

We  would  like  to  determine  if  a  hovercraft  under  these 
wind  conditions  can  reach  some  goal  zone,  S  =  {{xi,X2)  £ 
[190,  200]  X  [0, 10]}.  Note  that  when  outdoors  the  wind  forces 
are  significantly  greater  in  magnitude  than  the  control  inputs, 
making  the  system  uncontrollable. 

The  initial  state  is  =  [0  0  0  0  0  0]^.  The  distribution 
(1II.6)  was  used  to  solve  the  problem  10  times  on  a  IGHz  PC. 
Fig.  8  shows  the  solutions  of  the  problem  with  the  uniform 
sampling  distribution  and  adaptive  bias.  Fig.  9  shows  how  [3 
changes  as  the  algorithm  evolves.  The  adaptive  algorithm  is 
able  to  exploit  the  situations  in  which  biasing  is  effective.  As 
shown  in  Table  II,  the  adaptive  biasing  algorithm  improves 
the  efficiency  of  RRT  method  compared  to  other  fixed  bias 
strategies  rather  dramatically. 


Sampling  Method 

No.  of  Nodes 

Computation  Time  (sec) 

Uniform 

3556 

1753.5 

Med.  Bias 

1017 

490.2 

Heavy  Bias 

912 

408.3 

Adaptive  Bias 

678 

342.5 

TABLH  11 

Hovercraft  Example:  A  comparison  of  the  sampling  strategy 

INTRODUCED  HERE  (ADAPTIVE  BIAS)  TO  FIXED-BIAS  SAMPLING 
STRATEGIES,  AVERAGED  OVER  10  TRIALS  ON  A  IGHZ  PC. 

IV.  Unified  Algorithm 
A.  Unified  algorithm 

Algorithm  3  and  4  present  the  unification  of  the  enhance¬ 
ments  presented  in  the  previous  section.  Note  that,  since 
most  robotic  problems  are  controllable,  the  Algorithm  1  can 
terminate  when  a  solution  is  found.  In  our  case,  it  is  a  distinct 
possibility  that  no  solution  exists  so  we  impose  a  secondary 
termination  criteria.  The  change  in  coverage  over  the  trailing 
N  iterations  Ac  ,  measures  the  growth  of  the  tree.  If  Ac  drops 
below  some  user-defined  Acmin  we  terminate  the  search. 


Algorithm  3  Generate  enhanced-RRT;  T 

Initialize  RRT:  T.addVertex(a:°  ^  <—  0) 

Global:  j3  =  1 

while  i^x  £  T  such  that  s{x)  <  0)  AND  Ac  >  Acmin  do 
enhanced-Extend(T) 

end  while 


Algorithm  4  enhanced-Extend(T) 

IT  =  (1  /?)(frniax  ^min)  “f 

^rand  g  ^  ^  B{x\  (3)  (see  eq.(III.6)  ) 

^near  ^  arg  min^-j  g-j- [iF  ,  0:’’“"''^;  f2go)]  (see 

eq.(III.2),(III.3)) 

^uew  ^  argmin„gp[f2go(cc™’^^a:’^“’’  +  f*  f{x,u)dt)] 
^new  ^  ^near  /(cc,  (f)  )df 

if  =x^  &T  then 

+  + 

U 

goto  compute 
end  if 

r.addVertex(a;"®“',  =  0) 

T.addEdge(M"®“,  ^  x"®™) 

reset  U 

if  Ng  iterations  then 
(3=^ 

end  if 


B.  A  Multiagent  Problem 

We  consider  a  problem  where  multiple  autonomous  vehicles 
must  guard  against  an  intruder  entering  a  designated  area.  This 
scenario  has  applications  in  games  such  as  “capture  the  flag” 
and  can  be  viewed  as  a  variant  of  the  art-gallery  problem. 
It  has  applications  in  homeland  security  where  autonomous 
vehicles  (boats,  airplanes,  ground  robots)  can  be  deployed  to 
detect  unidentified  vehicles  entering  a  cordoned-off  area  or  an 
exclusion  zone. 

In  this  example,  we  examine  a  circular  area,  Se  guarded  by 
4  robots.  Each  robot  has  sensor  foot  prints  which  are  assumed 
to  be  circular  with  radius  for  detection  and  R,.  for  capture, 
as  shown  in  Eig.  10.  The  guarding  scheme  is  shown  in  Eig.  11. 
Initially,  the  guard  robots  distribute  evenly  along  the  perimeter 
of  the  exclusion  zone.  If  the  intruder  enters  the  detection  range 
of  a  guard  robot,  the  robot  pursues  the  intruder  and  other 
robots  redistribute  evenly  along  the  circle  Ce-  If  the  intruder 
escapes  the  detection  range  of  the  pursuing  robot,  the  robot 
returns  to  the  perimeter  and  all  robots  redistribute  evenly.  The 
question  we  wish  to  answer  is  as  follows.  If  an  intruder  or  an 
adversary  is  allowed  to  start  anywhere  in  a  specified  region  Sj, 
and  the  guard  robots  are  evenly  distributed  on  the  circle  Ce, 
can  the  intruder  enter  the  exclusion  zone  {Se)  uncaptured?  The 
answer  to  our  question  can  only  be  found  by  searching  for  an 
initial  condition  and  a  control  input  function  for  the  intruder 
which  drives  it  into  the  exclusion  zone  without  crossing  any  of 
the  capture  ranges.  We  assume  each  of  the  intruder  and  guard 
robots  has  5  states,  x*  =  (x}, X2, 0*, x*, w*)  and  2  control 


inputs,  w®  =  {u\,U2)  where  and  indicate  states  and  input 
of  the  intruder.  The  dynamics  with  nonholonomic  constraints 
are  given  by: 

i\  =  v^cos{9^),  ±2  =  v''sin{6^),  0*  =  a;®  (IV 1) 
v'^  =  u\,  LO'^  =  u\. 

We  can  define  the  free  space  X  =  Xi  x  X2  x  •  •  •  x  X^  \ 
UL2  Bix^it),  Rc)  C  K25  where 

X,  =  {{x\,xl,9\v\io^)  G  R^\{x\f  +  <  Rj} 

B{x\t),Rc)  =  {{x\,xl)\{xl  -  x\f  +  {xl  -  x\f  <  RD- 

Then  the  specification  set  S  is  defined  by 

S  =  {x  €  X\{x\f  +  {x\f  <  Rl} 

where  Rg  is  the  radius  of  the  circle  Ce- 

To  evenly  distribute  guard  robots  along  the  perimeter,  we 
use  the  algorithm  proposed  in  [4].  Each  guard  robot  is  subject 
to  the  force 

=  -kV^'^iq^)  -  Cq^  +  ^  Friq^,q’^)  (IV.2) 

kGNj 

where  q^  =  {x{,X2)  ^  '^he  position  of  robot  j,  ijj  : 

^  R  is  an  implicit  function  description  of  the  perimeter 
of  the  exclusion  zone  that  must  be  guarded  and  Nj  is  the  set 
of  robots  neighboring  robot  j.  is  a  Coloumb-like  repulsive 
force  that  ensures  that  the  robots  do  not  cluster  together,  while 
C  is  a  constant  which  provides  a  viscous  damping  term.  The 
force  is  applied  to  a  point  that  is  at  a  finite  distance  away 
from  a  robot  to  address  nonholonomic  constraints.  A  detailed 
description  of  the  control  law  including  a  proof  of  convergence 
to  different  shapes  is  provided  in  [4].  However,  the  analysis 
in  the  paper  cannot  be  used  to  predict  the  transients  as  each 
guard  robot  moves  toward  the  perimeter. 

Note  that  the  reachable  set  of  states  in  Ai  is  a  small  subset  of 
the  entire  state  due  to  the  fact  that  the  system  is  uncontrollable 
and  U  is  bounded.  Finally,  note  that  the  intruder  can  start 
anywhere  in  the  set  Sj.  In  other  words,  the  initial  condition  for 
the  intruder  must  be  chosen  from  this  finite  set,  each  condition 
leading  to  a  RRT 

We  apply  the  RRFT  algorithm  with  enhancements  suggested 
in  Sec.  IV-A  to  the  problem.  The  control  inputs  are  u  — 
{u\,U2)  €  U  =  [—6  6]  X  [— 7r/12  7r/12]  with  i?/  =  300m, 
Rs  =  100m,  Rd  =  100m  and  Rc  =  40m.  Fig.  12  shows  the 
forest  of  trees  where  a  solution  trajectory  is  found,  visualizing 
the  position  of  the  intruder.  Eight  initial  conditions  are  gener¬ 
ated  and  a  forest  starts  to  grow  until  a  solution  is  found.  Due  to 
the  space  limitation,  we  show  only  the  trajectories  obtained  for 
the  algorithm  with  the  “dynamics-based  selection  of  proximal 
node”.  However,  the  Table  III  shows  the  statistics  obtained  for 
this  example  with  all  the  options.  The  second  column  shows 
the  average  number  of  nodes  used  to  find  a  solution  trajectory 
for  the  intruder  robot  (one  such  trajectory  is  shown  in  Fig.  12). 
The  third  column  shows  the  computation  time  with  different 
options.  The  first  main  point  to  note  from  these  two  columns  is 
that  the  standard  algorithm  takes  four  times  as  long  requiring 


Fig.  10.  Initial  conditions  for  guard  robots  and  intruder.  Each  robot  has  a 
detection  range  within  which  the  intruder  is  detected,  and  capture  ragne 
Rc  within  which  the  intruder  is  captured. 


Fig.  11.  Guarding  scheme  of  the  robots.  Distribute  until  the  intruder  is 
detected  {lefi)  ;  and  pursue  if  the  intmder  is  within  the  detection  range  of  a 
guard  robot  (right). 


three  times  the  number  of  nodes  to  find  the  same  solution. 
The  second  main  point  in  this  example  is  that  adaptive  biasing 
allows  the  most  improvement  in  efficiency.  The  fourth  column 
shows  a  snapshot  of  the  coverage  measure  after  5000  nodes 
have  been  visited  for  all  cases.  The  N/A  is  used  because  it 
is  less  meaningful  to  show  this  number  for  adaptive  sampling 
when  improving  coverage  is  not  the  driving  force  behind  using 
the  adaptive  bias. 


Enhancement 

Method 

No.  of 
Nodes 

Computation 
Time  (sec) 

Coverage 

Measure 

No  Enhancement 

20544 

9020.9 

0.0190 

Dynamics-based 

12960 

3655.6 

0.0196 

History-based 

8176 

3101.8 

0.0246 

Adaptively  biased 

6398 

1791.6 

N/A 

Three  Enhancements 

7520 

2429.2 

N/A 

TABLE  III 

Guard-intruder  Example:  A  comparison  of  the  algorithm,  with 
AND  without  enhancements  AVERAGED  OVER  10  TRIALS  ON  A  3GHZ 

PC. 


V.  Conclusion 

The  RRT  algorithm  has  been  successful  in  solving  complex 
motion  planning  problems.  We  explore  the  application  of  this 
algorithm  and  its  variants  to  the  problem  of  testing  complex 
reactive  control  systems  and  validating  the  effectiveness  of 
multi-agent  controllers.  Testing  and  validation  involve  search¬ 
ing  for  conditions  that  lead  to  system  failure  by  exploring 
all  adversarial  inputs  and  disturbances  for  errant  trajectories. 
Unlike  motion  planning  problems,  the  systems  may  not  be 
controllable  with  respect  to  disturbances  or  adversarial  inputs 
and  the  reachable  set  of  states  is  generally  a  small  subset 


Fig.  12.  The  forest  of  RRTs  with  8  different  initial  conditions.  A  solution 
trajectory  of  the  intruder  is  highlighted  on  the  right. 

of  the  entire  state  space.  Because  of  the  differences  between 
testing  and  motion  planning,  we  propose  three  modifications  to 
the  original  RRT  algorithm.  First,  we  develop  a  new  distance 
function  which  encodes  local  information  about  the  system’s 
dynamics  with  a  first  order  approximation.  Second,  because 
the  reachable  state  space  is  generally  a  small  fraction  of  the 
total  state  space,  we  modify  the  node  selection  strategy  to 
discourage  the  repeated  selection  of  boundary  nodes.  Finally, 
we  propose  a  scheme  for  adaptively  modifying  the  sampling 
probability  distribution  based  on  tree  growth  to  the  specifica¬ 
tion  set.  We  demonstrate  the  application  of  the  algorithm  via 
three  simple  examples  and  one  large  scale  (25  dimensions) 
multi-agent  pursuit-evasion  and  provide  computational  statis¬ 
tics  demonstrating  a  reduction  of  computation  time  by  a  factor 
of  three. 
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